# Preference Learning
## URM-LLaMa-3.1-8B
URM-LLaMa-3.1-8B is an uncertainty-aware reward model designed to enhance the alignment of large language models.
Large Language Model · LxzGordon · 4,688 · 10
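A reward model like this is typically queried by scoring a prompt–response pair. The sketch below assumes the checkpoint loads as a standard Transformers sequence-classification model; the repository id and head layout are assumptions from this listing, not confirmed details, so check the model card before relying on it.

```python
# Minimal sketch: scoring a response with a reward model via Hugging Face
# Transformers. Assumes a sequence-classification head; an uncertainty-aware
# RM may emit extra outputs (e.g. a variance term), so inspect the logits.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "LxzGordon/URM-LLaMa-3.1-8B"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Explain what a reward model does."},
    {"role": "assistant",
     "content": "It scores responses so an LLM can be aligned with human preferences."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

with torch.no_grad():
    logits = model(input_ids=input_ids).logits[0]
# Higher scores indicate a more preferred response; the exact output
# convention (single scalar vs. per-attribute scores) is the repo's to define.
print(logits)
```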
## Llama-3-Base-8B-SFT-IPO
Llama-3-Base-8B-SFT-IPO is a Llama 3 8B base model fine-tuned with IPO (Identity Preference Optimization), released by princeton-nlp alongside SimPO, a simple preference optimization method that eliminates the need for a reference model and thereby simplifies the preference optimization process.
Large Language Model · Transformers · princeton-nlp · 1,786 · 1
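For context on the reference-free claim: SimPO (Meng et al., 2024) replaces DPO's reference-model log-ratio with the policy's length-normalized average log-probability as the implicit reward, plus a target margin. A minimal PyTorch sketch of that loss, with all tensor names chosen here for illustration:

```python
# Minimal sketch of the SimPO loss (Meng et al., 2024). Inputs are summed
# token log-probs of the chosen and rejected responses under the policy,
# plus their token lengths; no reference model is involved (unlike DPO).
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,    # sum of log p(token) over chosen response
    rejected_logps: torch.Tensor,  # sum of log p(token) over rejected response
    chosen_lens: torch.Tensor,     # token counts of chosen responses
    rejected_lens: torch.Tensor,   # token counts of rejected responses
    beta: float = 2.0,             # reward scale
    gamma: float = 1.0,            # target reward margin
) -> torch.Tensor:
    # Length-normalized average log-probabilities act as implicit rewards.
    chosen_reward = beta * chosen_logps / chosen_lens
    rejected_reward = beta * rejected_logps / rejected_lens
    # Bradley-Terry-style logistic loss with a target margin gamma.
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()

# Toy usage with made-up numbers:
loss = simpo_loss(
    chosen_logps=torch.tensor([-42.0]), rejected_logps=torch.tensor([-55.0]),
    chosen_lens=torch.tensor([30.0]), rejected_lens=torch.tensor([28.0]),
)
print(loss.item())
```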
## AmberSafe
AmberSafe is a safety fine-tuned instruction model based on LLM360/AmberChat, part of LLM360's Amber model series, focused on providing safer text generation.
Large Language Model · Transformers · English · Apache-2.0 · LLM360 · 52 · 7
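A hedged usage sketch: assuming LLM360/AmberSafe loads as a standard causal-LM checkpoint in Transformers (the exact prompt or chat format is the repository's to define, so consult its model card):

```python
# Minimal sketch: generating text with AmberSafe via Hugging Face
# Transformers. Assumes a standard causal-LM checkpoint; the repo may
# prescribe a specific prompt template for best results.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM360/AmberSafe"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "How do I politely decline an unsafe request?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```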